{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Automatic Hyperparameter tuning\n",
    "\n",
    "This notebook will show you how to extend the code in the cloud-ml-housing-prices notebook to take advantage of Cloud ML Engine's [automatic hyperparameter tuning](https://cloud.google.com/ml-engine/docs/using-hyperparameter-tuning).\n",
    "\n",
    "We will use it to determine the ideal number of hidden units to use in our neural network.\n",
    "\n",
    "Cloud ML Engine uses bayesian optimization to find the hyperparameter settings for you. You can read the details of how it works [here.](https://cloud.google.com/blog/big-data/2017/08/hyperparameter-tuning-in-cloud-machine-learning-engine-using-bayesian-optimization)\n",
    "\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 1) Modify Tensorflow Code\n",
    "\n",
    "We need to make code changes to:\n",
    "1. Expose any hyperparameter we wish to tune as a command line argument (this is how CMLE passes new values)\n",
    "2. Modify the output_dir so each hyperparameter 'trial' gets written to a unique directory\n",
    "\n",
    "These changes are illustrated below. Any change from the original code has a **#NEW** comment next to it for easy reference"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 11,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "mkdir: cannot create directory ‘trainer’: File exists\n"
     ]
    }
   ],
   "source": [
    "%%bash\n",
    "mkdir trainer\n",
    "touch trainer/__init__.py"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 43,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Overwriting trainer/task.py\n"
     ]
    }
   ],
   "source": [
    "%%writefile trainer/task.py\n",
    "\n",
    "import argparse\n",
    "import pandas as pd\n",
    "import tensorflow as tf\n",
    "import os #NEW\n",
    "import json #NEW\n",
    "from tensorflow.contrib.learn.python.learn import learn_runner\n",
    "from tensorflow.contrib.learn.python.learn.utils import saved_model_export_utils\n",
    "\n",
    "print(tf.__version__)\n",
    "tf.logging.set_verbosity(tf.logging.ERROR)\n",
    "\n",
    "data_train = pd.read_csv(\n",
    "  filepath_or_buffer='https://storage.googleapis.com/spls/gsp418/housing_train.csv',\n",
    "  names=[\"CRIM\",\"ZN\",\"INDUS\",\"CHAS\",\"NOX\",\"RM\",\"AGE\",\"DIS\",\"RAD\",\"TAX\",\"PTRATIO\",\"MEDV\"])\n",
    "\n",
    "data_test = pd.read_csv(\n",
    "  filepath_or_buffer='https://storage.googleapis.com/spls/gsp418/housing_test.csv',\n",
    "  names=[\"CRIM\",\"ZN\",\"INDUS\",\"CHAS\",\"NOX\",\"RM\",\"AGE\",\"DIS\",\"RAD\",\"TAX\",\"PTRATIO\",\"MEDV\"])\n",
    "\n",
    "FEATURES = [\"CRIM\", \"ZN\", \"INDUS\", \"NOX\", \"RM\",\n",
    "            \"AGE\", \"DIS\", \"TAX\", \"PTRATIO\"]\n",
    "LABEL = \"MEDV\"\n",
    "\n",
    "feature_cols = [tf.feature_column.numeric_column(k)\n",
    "                  for k in FEATURES] #list of Feature Columns\n",
    "\n",
    "def generate_estimator(output_dir):\n",
    "  return tf.estimator.DNNRegressor(feature_columns=feature_cols, \n",
    "                                            hidden_units=[args.hidden_units_1, args.hidden_units_2], #NEW (use command line parameters for hidden units)\n",
    "                                            model_dir=output_dir)\n",
    "\n",
    "def generate_input_fn(data_set):\n",
    "    def input_fn():\n",
    "      features = {k: tf.constant(data_set[k].values) for k in FEATURES}\n",
    "      labels = tf.constant(data_set[LABEL].values)\n",
    "      return features, labels\n",
    "    return input_fn\n",
    "\n",
    "def serving_input_fn():\n",
    "  #feature_placeholders are what the caller of the predict() method will have to provide\n",
    "  feature_placeholders = {\n",
    "      column.name: tf.placeholder(column.dtype, [None])\n",
    "      for column in feature_cols\n",
    "  }\n",
    "  \n",
    "  #features are what we actually pass to the estimator\n",
    "  features = {\n",
    "    # Inputs are rank 1 so that we can provide scalars to the server\n",
    "    # but Estimator expects rank 2, so we expand dimension\n",
    "    key: tf.expand_dims(tensor, -1)\n",
    "    for key, tensor in feature_placeholders.items()\n",
    "  }\n",
    "  return tf.estimator.export.ServingInputReceiver(\n",
    "    features, feature_placeholders\n",
    "  )\n",
    "\n",
    "train_spec = tf.estimator.TrainSpec(\n",
    "                input_fn=generate_input_fn(data_train),\n",
    "                max_steps=3000)\n",
    "\n",
    "exporter = tf.estimator.LatestExporter('Servo', serving_input_fn)\n",
    "\n",
    "eval_spec=tf.estimator.EvalSpec(\n",
    "            input_fn=generate_input_fn(data_test),\n",
    "            steps=1,\n",
    "            exporters=exporter)\n",
    "\n",
    "######START CLOUD ML ENGINE BOILERPLATE######\n",
    "if __name__ == '__main__':\n",
    "  parser = argparse.ArgumentParser()\n",
    "  # Input Arguments\n",
    "  parser.add_argument(\n",
    "      '--output_dir',\n",
    "      help='GCS location to write checkpoints and export models',\n",
    "      required=True\n",
    "    )\n",
    "  parser.add_argument(\n",
    "        '--job-dir',\n",
    "        help='this model ignores this field, but it is required by gcloud',\n",
    "        default='junk'\n",
    "    )\n",
    "  parser.add_argument(\n",
    "        '--hidden_units_1', #NEW (expose hyperparameter to command line)\n",
    "        help='number of neurons in first hidden layer',\n",
    "        type = int,\n",
    "        default=10\n",
    "    )\n",
    "  parser.add_argument(\n",
    "        '--hidden_units_2', #NEW (expose hyperparameter to command line)\n",
    "        help='number of neurons in second hidden layer',\n",
    "        type = int,\n",
    "        default=10\n",
    "    )\n",
    "  args = parser.parse_args()\n",
    "  arguments = args.__dict__\n",
    "  output_dir = arguments.pop('output_dir')\n",
    "  output_dir = os.path.join(#NEW (give each trial its own output_dir)\n",
    "      output_dir,\n",
    "      json.loads(\n",
    "          os.environ.get('TF_CONFIG', '{}')\n",
    "      ).get('task', {}).get('trial', '')\n",
    "  )\n",
    "######END CLOUD ML ENGINE BOILERPLATE######\n",
    "\n",
    "  #initiate training job\n",
    "  tf.estimator.train_and_evaluate(generate_estimator(output_dir), train_spec, eval_spec)\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 2) Define Hyperparameter Configuration File\n",
    "\n",
    "Here you specify:\n",
    "\n",
    "1. Which hyperparamters to tune\n",
    "2. The min and max range to search between\n",
    "3. The metric to optimize\n",
    "4. The number of trials to run"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 40,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Overwriting config.yaml\n"
     ]
    }
   ],
   "source": [
    "%%writefile config.yaml\n",
    "trainingInput:\n",
    "  hyperparameters:\n",
    "    goal: MINIMIZE\n",
    "    hyperparameterMetricTag: average_loss\n",
    "    maxTrials: 5\n",
    "    maxParallelTrials: 1\n",
    "    params:\n",
    "    - parameterName: hidden_units_1\n",
    "      type: INTEGER\n",
    "      minValue: 1\n",
    "      maxValue: 100\n",
    "      scaleType: UNIT_LOG_SCALE\n",
    "    - parameterName: hidden_units_2\n",
    "      type: INTEGER\n",
    "      minValue: 1\n",
    "      maxValue: 100\n",
    "      scaleType: UNIT_LOG_SCALE"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "collapsed": true
   },
   "source": [
    "### 3) Train"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 33,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "GCS_BUCKET = 'gs://vijays-sandbox-ml' #CHANGE THIS TO YOUR BUCKET\n",
    "PROJECT = 'vijays-sandbox' #CHANGE THIS TO YOUR PROJECT ID\n",
    "REGION = 'us-central1' #OPTIONALLY CHANGE THIS"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 34,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "import os\n",
    "os.environ['GCS_BUCKET'] = GCS_BUCKET\n",
    "os.environ['PROJECT'] = PROJECT\n",
    "os.environ['REGION'] = REGION"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### Run local\n",
    "It's a best practice to first run locally to check for errors. Note you can ignore the warnings in this case, as long as there are no errors."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 44,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "1.5.0\n"
     ]
    },
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "/usr/local/lib/python2.7/dist-packages/h5py/__init__.py:36: FutureWarning: Conversion of the second argument of issubdtype from `float` to `np.floating` is deprecated. In future, it will be treated as `np.float64 == np.dtype(float).type`.\n",
      "  from ._conv import register_converters as _register_converters\n",
      "2018-03-13 23:10:57.249216: I tensorflow/core/platform/cpu_feature_guard.cc:137] Your CPU supports instructions that this TensorFlow binary was not compiled to use: SSE4.1 SSE4.2 AVX AVX2 FMA\n"
     ]
    }
   ],
   "source": [
    "%%bash\n",
    "gcloud ml-engine local train \\\n",
    "   --module-name=trainer.task \\\n",
    "   --package-path=trainer \\\n",
    "   -- \\\n",
    "   --output_dir='./output'"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### Run on cloud (1 cloud ML unit)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 41,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "Updated property [core/project].\n"
     ]
    }
   ],
   "source": [
    "%%bash\n",
    "gcloud config set project $PROJECT"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 47,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "jobId: housing_180313_232321\n",
      "state: QUEUED\n"
     ]
    },
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "Job [housing_180313_232321] submitted successfully.\n",
      "Your job is still active. You may view the status of your job with the command\n",
      "\n",
      "  $ gcloud ml-engine jobs describe housing_180313_232321\n",
      "\n",
      "or continue streaming the logs with the command\n",
      "\n",
      "  $ gcloud ml-engine jobs stream-logs housing_180313_232321\n"
     ]
    }
   ],
   "source": [
    "%%bash\n",
    "JOBNAME=housing_$(date -u +%y%m%d_%H%M%S)\n",
    "\n",
    "gcloud ml-engine jobs submit training $JOBNAME \\\n",
    "   --region=$REGION \\\n",
    "   --module-name=trainer.task \\\n",
    "   --package-path=./trainer \\\n",
    "   --job-dir=$GCS_BUCKET/$JOBNAME/ \\\n",
    "   --runtime-version 1.4 \\\n",
    "   --config config.yaml \\\n",
    "   -- \\\n",
    "   --output_dir=$GCS_BUCKET/$JOBNAME/output\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 4) Inspect Results\n",
    "\n",
    "In cloud console (https://console.cloud.google.com/mlengine/jobs) you will see the output of each trial, which hyperparameters were choosen, and what the resulting loss was. Trials will be shown in order of performance, with the best trial on top."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": []
  }
 ],
 "metadata": {
  "anaconda-cloud": {},
  "kernelspec": {
   "display_name": "Python 2",
   "language": "python",
   "name": "python2"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 2
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython2",
   "version": "2.7.12"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}